ABSTRACT


In this thesis, a new fictitious play (FP) procedure is presented to solve two- 
person zero-sum (TPZS) Blotto games. The FP solution procedure solves TPZS games by 
assuming that the two players take turns selecting optimal responses to the opponent’s 
strategy observed so far. It is known that FP converges to an optimal solution, and it may 
be the only realistic approach to solve large games. The algorithm uses dynamic 
programming (DP) to solve FP subproblems. Efficiency is obtained by limiting the 
growth of the DP state space. 


Blotto games are frequently used to solve simple missile defense problems. While
it may be unlikely that the models presented in this paper can be used directly to solve
realistic offense and defense problems, it is hoped that they will provide insight into the
basic structure of optimal and near-optimal solutions to these important, large games, and
provide a foundation for solution of more realistic, and more complex, problems.
EXECUTIVE SUMMARY


Fictitious play (FP), first introduced by Brown and Robinson (1951), is an
iterative procedure used to approximate solutions to two-person zero-sum (TPZS) games.
At each iteration of FP, each player chooses a pure strategy that is a best reply to the
mixed strategy represented by the aggregation of all of the other player's pure strategies
played so far, assuming they will be chosen based on the empirical probability
distribution induced by their historical frequency in all previous iterations. Fictitious play
can be thought of as mimicking the behavior of players learning from their opponents.


The purpose of this thesis is to investigate the use of fictitious play in the solution 
of Blotto games and their generalizations. In Blotto games, opponents each allocate a 
limited number of forces to a specified number of areas. Payoffs in each area accrue to 
the players based on the number of forces assigned to each area. The main application of
Blotto games has been the analysis of missile attack and defense problems.


In this thesis a new fictitious play (FP) procedure is presented to solve two-person 
zero-sum (TPZS) Blotto games. The FP solution procedure solves TPZS games by 
assuming that the two players take turns selecting optimal responses to the opponent’s 
strategy observed so far. It is known that FP converges to an optimal solution, and it may 
be the only realistic approach to solve large games. The algorithm we develop uses 
dynamic programming (DP) to solve the FP subproblems. Efficiency is obtained by 
limiting the growth of the DP state space. We derive the dynamic programming 
recurrence relation for solving Blotto games with a general payoff function using 
fictitious play. The recurrence can be solved without explicitly keeping track of every 
attack or defense played; rather, the information required is simply the number of times a 
given force level (number of attackers or defenders) has been used in each area, over all 
attacks and defenses seen so far. Although our experiments considered one type of 
attacker and one type of defender, we indicate how to generalize this procedure to cases 


with more than one type of attacker or defender (or both). 




During this study, we identified other topics for further investigations. The first is 
to investigate generalizations of Blotto games in which defenders can be placed in such a 
way as to defend multiple areas at once. This is closer to the real situation with missile 
defense. The second is to explore the issue of playability in the ILP formulations. Our 
proposed playability constraint is currently too restrictive; we have provided examples in 
which the optimal solution to the ILP, with the playability constraint, is not equal to the 
value of the game. Further research should explore less restrictive, alternate formulations 
of playability constraints. It is possible (although unlikely) that less restrictive playability 
constraints would also yield more efficiently solvable ILP, making that approach 


competitive with the DP-based procedure for larger problems. 


While it is unlikely that the models presented in this paper can be used directly to 
solve realistic offense and defense problems, it is hoped that they will provide insight into 


the basic structure of optimal and near optimal solutions to these important, large games. 




I. INTRODUCTION 


Fictitious play (FP), first introduced by Brown and Robinson (1951), is an
iterative procedure used to approximate solutions to two-person zero-sum (TPZS) games.
At each iteration of FP, each player chooses a pure strategy that is a best reply to the
mixed strategy represented by the aggregation of all of the other player's pure strategies
played so far, assuming they will be chosen based on the empirical probability
distribution induced by their historical frequency in all previous iterations. Fictitious play
can be thought of as mimicking the behavior of players learning from their opponents.


The purpose of this thesis is to investigate the use of fictitious play in the solution 
of Blotto games and their generalizations. In Blotto games, opponents each allocate a 
limited number of forces to a specified number of areas. Payoffs in each area accrue to 
the players based on the number of forces assigned to each area. The main application of 


Blotto games has been the analysis of missile attack and defense problems. 


This thesis is organized as follows: in Chapter II, we introduce TPZS games, 
Blotto games and the solution procedure. In Chapter III, we solve various versions of the 


problem using FP. Chapter IV provides conclusions and suggests further work. 
II. TWO-PERSON ZERO-SUM GAMES


A. DEFINITION 

A two-person zero-sum (TPZS) game (von Neumann and Morgenstern, 1944) is a 
situation where there are two players having directly opposite interests. In a TPZS game, 
player 1 (also called X, the row player, or the maximizer) has m pure strategies and player 
2 (Y, column player, minimizer) has n pure strategies. A player can commit to playing a 
pure strategy, or, by randomizing his choice among several pure strategies, he can 
employ a mixed strategy. A mixed strategy is represented by a vector of probabilities of 


choosing each pure strategy. For player 1, we write this vector as:

$$x = (x_1, x_2, \ldots, x_m).$$

Because x is a vector of probabilities, we have the restrictions that

$$\sum_{i=1}^{m} x_i = 1$$

and

$$x_i \ge 0, \quad i = 1, \ldots, m.$$

Similar notation and restrictions are used for player 2, whose mixed strategy is written as:

$$y = (y_1, y_2, \ldots, y_n).$$


In a TPZS game, each player chooses a strategy (pure or mixed), unknown to the other,
and both strategies are revealed simultaneously. The result of the game depends on the
strategy used by each player. If X and Y choose their i-th and j-th pure strategies,
respectively, then the result of the game, denoted $a_{ij}$, represents the amount that Y has
to pay X. Equivalently, the payoffs to X and Y are $a_{ij}$ and $-a_{ij}$, respectively. Note that
the sum of the two payoffs is zero, which explains the name of the game. If the two players
employ mixed strategies x and y (and a pure strategy is just a special case of a mixed
strategy), then the payoff to player 1 is:

$$\sum_{i=1}^{m} \sum_{j=1}^{n} x_i a_{ij} y_j,$$

which can be seen as the expected payoff among all of the pure strategies represented by
x and y.
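
The expected payoff above can be checked numerically. The following is a minimal Python sketch, not part of the thesis code (which is in MATLAB); the function name `expected_payoff` and the 2x2 "matching pennies" matrix are illustrative assumptions.

```python
def expected_payoff(x, A, y):
    """Return sum_i sum_j x[i] * A[i][j] * y[j], the payoff to player 1."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x))
               for j in range(len(y)))


# Hypothetical payoff matrix (matching pennies); not taken from the thesis.
A = [[1, -1],
     [-1, 1]]

# Mixed strategy for X against a mixed strategy for Y.
mixed_value = expected_payoff([0.5, 0.5], A, [0.25, 0.75])

# A pure strategy is the special case with a single 1 in the vector:
# row 1 against column 2 simply picks out the entry A[0][1].
pure_value = expected_payoff([1, 0], A, [0, 1])
```

With X mixing evenly, the expected payoff is zero whatever Y does, which is why that mix is optimal for X in this small example.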


Therefore, a TPZS game is completely defined when the payoff for each pair of X
and Y pure strategies is determined. These payoffs can be summarized in an m×n matrix,
generally referred to as a payoff matrix, i.e.

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix},$$

and the payoff to player 1 is then $x^T A y$.
In playing the game, both players are assumed to choose a strategy that achieves
the most favorable outcome. This means that X would choose the strategy x that
maximizes the minimum of $x^T A y$ over all choices of y. On the other hand, Y would
choose the strategy y that minimizes the maximum of $x^T A y$ over all choices of x.


A TPZS game has an equilibrium point when each player can guarantee an 
optimal result by always choosing a single pure strategy. When equilibrium cannot be 


achieved, players must use mixed strategies to optimize the value of the game. 
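
Whether a game has such a pure-strategy equilibrium can be tested by comparing the two players' pure security levels. The sketch below is a Python illustration under that standard saddle-point test; the function names and both example matrices are assumptions made for the example.

```python
def pure_security_levels(A):
    """Return (maximin, minimax) over pure strategies of payoff matrix A."""
    rows, cols = len(A), len(A[0])
    # X can guarantee at least the best of the row minima.
    maximin = max(min(A[i]) for i in range(rows))
    # Y can hold X to at most the best of the column maxima.
    minimax = min(max(A[i][j] for i in range(rows)) for j in range(cols))
    return maximin, minimax


def has_pure_equilibrium(A):
    """A pure-strategy equilibrium exists exactly when the two levels agree."""
    maximin, minimax = pure_security_levels(A)
    return maximin == minimax


# Hypothetical matrices, not from the thesis.
saddle_game = [[4, 2],
               [3, 1]]        # entry 2 is both a row minimum and a column maximum
matching_pennies = [[1, -1],
                    [-1, 1]]  # no pure equilibrium, mixed strategies are needed
```

When the two levels differ, as in the second matrix, the value of the game lies between them and can only be reached with mixed strategies.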


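To choose the best randomized strategy, X must find the $x = (x_1, \ldots, x_m)$ to

$$\max_{x} \Big\{ \min_{j} \Big[ \sum_{i=1}^{m} x_i a_{ij} \Big] : \sum_{i=1}^{m} x_i = 1,\ x_i \ge 0,\ i = 1, \ldots, m \Big\}.$$

Similarly, Y must find the $y = (y_1, \ldots, y_n)$ to

$$\min_{y} \Big\{ \max_{i} \Big[ \sum_{j=1}^{n} a_{ij} y_j \Big] : \sum_{j=1}^{n} y_j = 1,\ y_j \ge 0,\ j = 1, \ldots, n \Big\}.$$

Let x* and y* denote the optimal strategies for X and Y. Then, $v^* = (x^*)^T A y^*$ is the value
of the game. One of the central results of game theory states (Winston, 1991) that:

$$v^* = \max_{x} \Big\{ \min_{j} \Big[ \sum_{i=1}^{m} x_i a_{ij} \Big] : \sum_{i=1}^{m} x_i = 1,\ x_i \ge 0,\ i = 1, \ldots, m \Big\}, \text{ and}$$

$$v^* = \min_{y} \Big\{ \max_{i} \Big[ \sum_{j=1}^{n} a_{ij} y_j \Big] : \sum_{j=1}^{n} y_j = 1,\ y_j \ge 0,\ j = 1, \ldots, n \Big\}.$$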

B. LINEAR PROGRAMMING 
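When the payoff matrix is specified and it is not too large, linear programming
(Winston, 1991) can be used to find the optimal mixed strategies and the value of the
game. For the maximizer (player 1), the problem is to find the mixed strategy
$x = (x_1, \ldots, x_m)$ which maximizes $\min_j \sum_{i=1}^{m} x_i a_{ij}$. That is,

$$\begin{aligned}
\text{LP1:}\quad \max\ & v \\
\text{subject to}\quad & \sum_{i=1}^{m} x_i a_{ij} - v \ge 0, \quad j = 1, \ldots, n \\
& \sum_{i=1}^{m} x_i = 1, \text{ and } x_i \ge 0,\ i = 1, \ldots, m.
\end{aligned}$$

Similarly, the minimizer (player 2) must solve LP2:

$$\begin{aligned}
\text{LP2:}\quad \min\ & w \\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij} y_j - w \le 0, \quad i = 1, \ldots, m \\
& \sum_{j=1}^{n} y_j = 1, \text{ and } y_j \ge 0,\ j = 1, \ldots, n.
\end{aligned}$$

It is easy to show that problems LP1 and LP2 are duals of each other. Moreover, if
(v*, x*) and (w*, y*) are optimal to problems LP1 and LP2, respectively, then v* = w*.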
C. FICTITIOUS PLAY (FP): BROWN-ROBINSON METHOD

Fictitious play (FP) was introduced by Brown and Robinson (1951). It is an 
iterative solution procedure; in each iteration, players choose pure strategies that are the 


best response to the empirical mix of their opponents’ pure strategies seen so far. 


The FP procedure implemented here begins at iteration 1 with player 1 selecting
that row maximizing the minimum row value, and player 2 selecting that column
minimizing the maximum column value. Denote the players' pure strategies at iteration 1
as $x^{(1)}$ and $y^{(1)}$. These are vectors of all zeros except for a 1 at the selected row or
column locations. At iteration 2, player 1 selects pure strategy $x^{(2)}$, which is the best row
response to $y^{(1)}$; and player 2 selects pure strategy $y^{(2)}$, which is the best column
response to $x^{(1)}$. For general iteration k > 2, player 1 selects the pure strategy $x^{(k)}$, which
is the best row response to

$$\bar{y}^{(k-1)} = \frac{1}{k-1} \sum_{p=1}^{k-1} y^{(p)},$$

and player 2 selects pure strategy $y^{(k)}$, which is the best column response to

$$\bar{x}^{(k-1)} = \frac{1}{k-1} \sum_{p=1}^{k-1} x^{(p)}.$$

For computational purposes, $\bar{x}^{(k+1)}$ is conveniently updated from $\bar{x}^{(k)}$ and $x^{(k+1)}$ as follows:

$$\bar{x}^{(k+1)} = \left(\frac{k}{k+1}\right) \bar{x}^{(k)} + \left(\frac{1}{k+1}\right) x^{(k+1)}.$$

And similarly for player 2,

$$\bar{y}^{(k+1)} = \left(\frac{k}{k+1}\right) \bar{y}^{(k)} + \left(\frac{1}{k+1}\right) y^{(k+1)}.$$

Any limit points of the sequences $\{\bar{x}^{(k)}\}$ and $\{\bar{y}^{(k)}\}$ are optimal mixed strategy
solutions to the game. Also, upper and lower bounds on the value of the game, v, are
determined at each game play. Specifically, at game iteration k,

$$\underline{v}_k = (\bar{x}^{(k-1)})^T A\, y^{(k)} \le v \le (x^{(k)})^T A\, \bar{y}^{(k-1)} = \bar{v}_k,$$

and both $\underline{v}_k$ and $\bar{v}_k$ converge to v, but not necessarily monotonically (Eagle and
Washburn, 1991).
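
As an illustration of the Brown-Robinson iteration on an explicit payoff matrix, the following Python sketch keeps only play counts (the unnormalized empirical mixes) and tracks the best bounds seen so far. It is a simplified stand-in, not the thesis's MATLAB code; the function name, the first-index tie-breaking rule, and the rock-paper-scissors matrix are assumptions made for the example.

```python
def fictitious_play(A, iterations):
    """Run FP on payoff matrix A; return (x_bar, y_bar, lower, upper)."""
    m, n = len(A), len(A[0])
    # Iteration 1: maximin row for player 1, minimax column for player 2.
    row = max(range(m), key=lambda i: min(A[i]))
    col = min(range(n), key=lambda j: max(A[i][j] for i in range(m)))
    row_counts = [0] * m
    col_counts = [0] * n
    row_counts[row] += 1
    col_counts[col] += 1
    lower, upper = float("-inf"), float("inf")
    for k in range(2, iterations + 1):
        played = k - 1
        # Expected payoff of each row against Y's empirical mix so far.
        row_vals = [sum(A[i][j] * col_counts[j] for j in range(n)) / played
                    for i in range(m)]
        # Expected payoff of each column against X's empirical mix so far.
        col_vals = [sum(A[i][j] * row_counts[i] for i in range(m)) / played
                    for j in range(n)]
        row = max(range(m), key=row_vals.__getitem__)   # best row response
        col = min(range(n), key=col_vals.__getitem__)   # best column response
        # The bounds need not improve monotonically, so keep the best seen.
        upper = min(upper, row_vals[row])
        lower = max(lower, col_vals[col])
        row_counts[row] += 1
        col_counts[col] += 1
    x_bar = [c / iterations for c in row_counts]
    y_bar = [c / iterations for c in col_counts]
    return x_bar, y_bar, lower, upper


# Hypothetical test game: rock-paper-scissors, whose value is 0.
rps = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
x_bar, y_bar, lower, upper = fictitious_play(rps, 2000)
```

Storing counts rather than the full history of pure strategies is what makes each iteration cost the same amount of work, independent of k.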


D. BLOTTO GAMES 

1. Definition 

In a Blotto game, there are n ≥ 1 targets, or areas, for player 1 (the attacker) to
attack and player 2 (the defender) to defend. The attacker chooses a vector of allocations
x, where $x_k$ is the number of attacking units assigned to area k, and player 1 has f attackers
to distribute, resulting in the constraint $\sum_{k=1}^{n} x_k \le f$. The defender chooses a vector of
allocations y subject to $\sum_{k=1}^{n} y_k \le g$, where $y_k$ is the number of defenders assigned by
player 2 to area k. The payoff is $\sum_{k=1}^{n} A(x_k, y_k)$ (see Washburn, 1994, pp. 107-111 for a
more complete discussion). All allocations are required to be nonnegative, and in a discrete
Blotto game they are also required to be integers. The number of pure strategies for
player 1 is $\binom{n+f-1}{f}$ and, for player 2, $\binom{n+g-1}{g}$, both of which grow too fast to allow
complete enumeration in even moderately sized games. Figure 1 displays a typical
increase in the number of pure strategies for player 1 as f or n increases.





[Figure 1: # pure strategies plotted against # attackers (f) or targets (n); one curve
for f=10 with 1≤n≤100 and one for n=10 with 1≤f≤100.]

Figure 1. Number of pure strategies for player 1, for f=10, 1≤n≤100, and for n=10,
1≤f≤100.
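
The strategy counts can be reproduced directly from the binomial formula. This short Python sketch uses the standard-library `math.comb`; the helper name `num_pure_strategies` is an assumption for illustration.

```python
from math import comb


def num_pure_strategies(areas, units):
    """Number of ways to split `units` identical units over `areas` areas,
    i.e. the binomial coefficient C(areas + units - 1, units)."""
    return comb(areas + units - 1, units)


# n = 10 areas with f = 5 attackers and g = 20 defenders.
attacker_count = num_pure_strategies(10, 5)
defender_count = num_pure_strategies(10, 20)
```

Even this small case gives roughly ten million pure defenses, which is why complete enumeration of the payoff matrix is impractical.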


2. Playability 
In Blotto games, it is sometimes convenient to represent a mixed strategy with the
marginal distributions of the random vector $X = (X_1, \ldots, X_n)$, where $X_i$ is the random
variable representing the number of attackers assigned to area i. Marginal distributions
satisfying $\sum_k X_k = f$ are playable for the attacker. Similarly, the marginals for the random
vector $Y = (Y_1, \ldots, Y_n)$ are playable for the defender if $\sum_k Y_k = g$. However, the typical
approach is to relax these restrictions and simply require that $\sum_k E(X_k) = f$ and
$\sum_k E(Y_k) = g$.
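
The relaxed requirement only constrains expected totals, which are easy to compute from the marginals. The Python sketch below assumes a hypothetical layout in which `marginals[k][j]` is the probability that j units are assigned to area k; it checks the relaxed condition only and says nothing about strict playability.

```python
def expected_total(marginals):
    """Sum over areas k of E(X_k), where marginals[k][j] = P(X_k = j)."""
    return sum(j * p for area in marginals for j, p in enumerate(area))


def satisfies_relaxed_condition(marginals, total_units, tol=1e-9):
    """True when the expected number of units used equals total_units."""
    return abs(expected_total(marginals) - total_units) <= tol


# Hypothetical example: n = 2 areas, f = 1 attacker, and each area
# receives the attacker with probability 1/2.
example_marginals = [[0.5, 0.5],
                     [0.5, 0.5]]
relaxed_ok = satisfies_relaxed_condition(example_marginals, 1)
```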


3. ILP Formulation of Blotto Games 

Washburn (1994) presents the LP formulation of Blotto games using the marginal 
distributions. However, those formulations do not guarantee playability (although the 
discussions of those formulations claim that playability is not an issue for large 
problems). We modify the formulation in Washburn (1994) by explicitly adding 
constraints that are sufficient to enforce playability. The cost of such constraints is 
twofold: (1) they are sufficient, but not necessary conditions for playability, so the 
solutions we obtain are potentially suboptimal; and (2) they introduce integer variables, 


rendering integer linear programming (ILP) formulations. 


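ILP 1 solves a Blotto defense game for the defender's marginals $(y_0, \ldots, y_g)$,
where $y_j$ represents the probability that j defenders are used in any given area. Therefore,
$\sum_j j\, y_j = E(Y_k)$.

$$\begin{aligned}
\text{ILP 1:}\quad \min_{y,\,c,\,d}\ & v = nc + df \\
\text{s.t.}\quad & \sum_{j=0}^{g} A(i,j)\, y_j \le c + d\, i, \quad i = 0, 1, \ldots, f \\
& \sum_{j=0}^{g} j\, y_j \le g/n \\
& \sum_{j=0}^{g} y_j = 1 \\
& ycount_j = n\, y_j \\
& y_j \ge 0, \text{ for all } j = 0, 1, \ldots, g \\
& ycount_j \in \{0, \ldots, g\} \text{ and } d \ge 0
\end{aligned}$$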
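ILP 2 solves a Blotto attack game for the attacker's marginals $(x_0, \ldots, x_f)$. The
integer restrictions on $ycount_j$ and $xcount_i$ are used to require playability.

$$\begin{aligned}
\text{ILP 2:}\quad \max\ & v = na - bg \\
\text{s.t.}\quad & \sum_{i=0}^{f} A(i,j)\, x_i \ge a - b\, j, \quad j = 0, 1, \ldots, g \\
& xcount_i = n\, x_i \\
& x_i \ge 0, \text{ for all } i = 0, 1, \ldots, f \\
& xcount_i \in \{0, \ldots, f\} \text{ and } b \ge 0
\end{aligned}$$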


These playability constraints are too restrictive; they are sufficient to enforce 


playability but they are provably not necessary. 


4. New Fictitious Play Procedure 

We derive the dynamic programming recurrence relation for solving Blotto games
with FP, using a general payoff function $A_k(x, y)$, which is the amount player 2 pays to
player 1 when player 1 allocates x units to area k and player 2 allocates y units to area k.
The total payoff is obtained by summing the rewards over the n areas. The recurrence 
can be solved without explicitly keeping track of every attack or defense played; rather, 
the information required is simply the number of times a given force level (number of 
attackers or defenders) has been used in each area, over all attacks and defenses seen so 


far. 
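
The bookkeeping this implies is small: one counter per (area, force level) pair. The Python sketch below illustrates it with a hypothetical `update_counts` helper; the thesis's own marginal-update routines are MATLAB functions and may differ in detail.

```python
def update_counts(counts, allocation):
    """Record one observed pure strategy.

    counts[i][j] is the number of times exactly j units were placed in
    area i; allocation[i] is the number of units placed in area i.
    """
    for area, units in enumerate(allocation):
        counts[area][units] += 1
    return counts


# Hypothetical history: n = 3 areas, f = 2 attackers, three observed attacks.
n_areas, f_units = 3, 2
attack_counts = [[0] * (f_units + 1) for _ in range(n_areas)]
for attack in [(2, 0, 0), (1, 1, 0), (2, 0, 0)]:
    update_counts(attack_counts, attack)
```

The table has n(f+1) entries however many attacks have been observed, so its size does not grow with the number of FP iterations.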
We first consider the defender's problem at FP iteration K, which is to allocate g
defenders over n areas to minimize the expected payoff, given that K attacks have been
observed so far. Each attack can be represented by a column vector
$a^k = (a_1^k, a_2^k, \ldots, a_n^k)^T$, $k = 1, \ldots, K$. We define the values
$r_i^j = \sum_{k=1}^{K} I_{\{a_i^k = j\}}$, where $I_{\{a_i^k = j\}}$ represents the indicator
variable for the event, "the k-th attack used j attackers in area i." Therefore, $r_i^j$ represents
the number of times exactly j attackers have been used against area i. We first determine
the value of placing q defenders in area n, for each q = 0,...,g. Then we define a recurrence
relation on a value function $v_i(q)$ that represents the expected payoff of placing q
defenders optimally in areas i, i+1,..., n. Then $v_1(g)$ is the solution to the original
problem. The boundary condition is given by

$$v_n(q) = \frac{1}{K} \sum_{p=0}^{f} r_n^p A_n(p, q), \quad q = 0, \ldots, g, \tag{1}$$

which is the total expected payoff when the defender uses q defenders in area n. This
states that the optimal defender strategy when only area n remains is to allocate all q
remaining defenders to that area. The recurrence for $i \in \{1, \ldots, n-1\}$ is:

$$v_i(q) = \min_{j = 0, \ldots, q} \left\{ \frac{1}{K} \sum_{p=0}^{f} r_i^p A_i(p, j) + v_{i+1}(q - j) \right\}, \quad q = 0, \ldots, g. \tag{2}$$

This is the expected payoff in area i plus the expected payoff generated by placing
the remaining (q-j) defenders optimally in areas i+1,..., n. The optimal defender
allocation to area i is the value of j minimizing equation (2).
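
A direct transcription of the boundary condition and recurrence for the defender might look like the following Python sketch. It is illustrative only: it assumes a single payoff function `payoff(p, q)` shared by every area, a count table `r[i][p]` as defined above, and hypothetical example data.

```python
def best_defense(r, K, g, payoff):
    """Return (expected payoff, allocation) of a best pure defense.

    r[i][p] = number of the K observed attacks that used p attackers in
    area i; g = defenders to place; payoff(p, q) = amount paid to the
    attacker when p attackers meet q defenders (assumed equal in all areas).
    """
    n, f = len(r), len(r[0]) - 1
    # cost[i][j]: expected payoff in area i if j defenders are placed there.
    cost = [[sum(r[i][p] * payoff(p, j) for p in range(f + 1)) / K
             for j in range(g + 1)] for i in range(n)]
    v = [[0.0] * (g + 1) for _ in range(n)]
    choice = [[0] * (g + 1) for _ in range(n)]
    for q in range(g + 1):            # boundary: last area takes all q
        v[n - 1][q] = cost[n - 1][q]
        choice[n - 1][q] = q
    for i in range(n - 2, -1, -1):    # recurrence, working backwards
        for q in range(g + 1):
            j_best = min(range(q + 1),
                         key=lambda j: cost[i][j] + v[i + 1][q - j])
            choice[i][q] = j_best
            v[i][q] = cost[i][j_best] + v[i + 1][q - j_best]
    allocation, q = [], g             # recover the minimizing decisions
    for i in range(n):
        allocation.append(choice[i][q])
        q -= allocation[-1]
    return v[0][g], allocation


# Hypothetical data: 2 areas, attacks (2,0), (2,0), (1,1), so K = 3.
counts = [[0, 1, 2],
          [2, 1, 0]]
value, allocation = best_defense(counts, 3, 2, lambda p, q: max(p - q, 0))
```

In this example both defenders go to the first area, which is attacked far more heavily in the observed history.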

Similarly, for the attacker's problem, we assume that K defenses have been
observed so far, where the k-th defense is $d^k = (d_1^k, d_2^k, \ldots, d_n^k)^T$. The attacker
wishes to allocate f forces over the n areas to maximize the expected payoff. Let
$s_i^j = \sum_{k=1}^{K} I_{\{d_i^k = j\}}$ be the number of times j defenders are placed in area i.
Define $w_i(p)$ as the maximum possible expected payoff in areas i,...,n when p attackers
are available for those areas. The boundary conditions are:

$$w_n(p) = \frac{1}{K} \sum_{q=0}^{g} s_n^q A_n(p, q), \quad p = 0, \ldots, f, \tag{3}$$

and the recurrence is given by:

$$w_i(p) = \max_{j = 0, \ldots, p} \left\{ \frac{1}{K} \sum_{q=0}^{g} s_i^q A_i(j, q) + w_{i+1}(p - j) \right\}, \quad p = 0, \ldots, f. \tag{4}$$

The optimal solution for the attacker is represented by $w_1(f)$ and the corresponding
decisions j maximizing (4) for each area.
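
One way to gain confidence in the attacker's recurrence is to compare it with brute-force enumeration on a tiny instance. The Python sketch below does this under stated assumptions: a single payoff function shared by all areas, all f attackers used, and hypothetical defense counts.

```python
from itertools import product


def best_attack_value(s, K, f, payoff):
    """Maximum expected payoff from f attackers, by dynamic programming.

    s[i][q] = number of the K observed defenses that placed q defenders
    in area i; payoff(p, q) is assumed to be the same for every area.
    """
    n, g = len(s), len(s[0]) - 1
    gain = [[sum(s[i][q] * payoff(j, q) for q in range(g + 1)) / K
             for j in range(f + 1)] for i in range(n)]
    w = gain[n - 1][:]                 # boundary: last area takes all p
    for i in range(n - 2, -1, -1):     # recurrence over the earlier areas
        w = [max(gain[i][j] + w[p - j] for j in range(p + 1))
             for p in range(f + 1)]
    return w[f]


def brute_force_attack_value(s, K, f, payoff):
    """Same quantity by enumerating every allocation that uses f attackers."""
    n, g = len(s), len(s[0]) - 1
    best = float("-inf")
    for alloc in product(range(f + 1), repeat=n):
        if sum(alloc) != f:
            continue
        total = sum(s[i][q] * payoff(alloc[i], q)
                    for i in range(n) for q in range(g + 1)) / K
        best = max(best, total)
    return best


# Hypothetical data: 3 areas, g = 2, K = 4 observed defenses, binary payoff.
defense_counts = [[1, 2, 1],
                  [4, 0, 0],
                  [0, 1, 3]]
binary = lambda p, q: 1 if p > q else 0
dp_value = best_attack_value(defense_counts, 4, 3, binary)
check_value = brute_force_attack_value(defense_counts, 4, 3, binary)
```

The enumeration visits every allocation, while the recurrence does on the order of n·f² work per FP iteration.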

Blotto games can be extended immediately to the case where the attacker
possesses different numbers of, say, two types of attacking units, $f_1$ and $f_2$, and the
defender also has a supply of, say, two types of defending unit, $g_1$ and $g_2$. The payoff
function now depends on the number of attackers and defenders of each type allocated to
each area: $A_i(p_1, p_2, q_1, q_2)$. If we define $r_i^{j_1 j_2}$ as the number of times $j_1$ attackers of
type 1 and $j_2$ of type 2 have been used in area i, and similarly for $s_i^{j_1 j_2}$, then our value
functions are two-dimensional, with boundary conditions

$$v_n(q_1, q_2) = \frac{1}{K} \sum_{p_1=0}^{f_1} \sum_{p_2=0}^{f_2} r_n^{p_1 p_2} A_n(p_1, p_2, q_1, q_2) \tag{5}$$

and

$$w_n(p_1, p_2) = \frac{1}{K} \sum_{q_1=0}^{g_1} \sum_{q_2=0}^{g_2} s_n^{q_1 q_2} A_n(p_1, p_2, q_1, q_2), \tag{6}$$

and recurrences

$$v_i(q_1, q_2) = \min_{\substack{j_1 = 0, \ldots, q_1 \\ j_2 = 0, \ldots, q_2}} \left\{ \frac{1}{K} \sum_{p_1=0}^{f_1} \sum_{p_2=0}^{f_2} r_i^{p_1 p_2} A_i(p_1, p_2, j_1, j_2) + v_{i+1}(q_1 - j_1, q_2 - j_2) \right\} \tag{7}$$

and

$$w_i(p_1, p_2) = \max_{\substack{j_1 = 0, \ldots, p_1 \\ j_2 = 0, \ldots, p_2}} \left\{ \frac{1}{K} \sum_{q_1=0}^{g_1} \sum_{q_2=0}^{g_2} s_i^{q_1 q_2} A_i(j_1, j_2, q_1, q_2) + w_{i+1}(p_1 - j_1, p_2 - j_2) \right\}. \tag{8}$$

Clearly, the size of the state space grows with the product of the number of each type of
attacker or defender. It is still manageable with just a few types of attacker or defender.
III. DATA ANALYSIS AND RESULTS


A. MODEL IMPLEMENTATION

The FP model is implemented in MATLAB (Version 6.5). The ILP solution 
procedure is implemented in GAMS (Revision 135, XA solver). Computations are done 
on a 1.5 GHz Intel Centrino-based laptop computer with 512 MB of RAM. All computer 
code appears in Appendices A and B. 


B. NUMERICAL RESULTS 

1. Rate of Convergence of FP

Define gap(k) to be the difference between the upper and lower bounds on the
value of the game at FP iteration k. Consistent with earlier FP studies (Washburn, 2001),
we find that the FP gap plotted against number of iterations is approximately
asymptotically linear on a log-log plot. That is, for large enough k,

$$\text{gap}(k) \approx \frac{a}{k^b}, \quad \text{or} \quad \log(\text{gap}(k)) \approx \log(a) - b \log k,$$

where k is the number of iterations and a and b are fitted constants. Limited numerical
experimentation, using a least-squares fit and dropping the first 100 iterations, suggests
that the intercept (log(a)) increases with increasing f or g, and the slope (-b) increases (to
approximately -1/2) with increasing n. These observations are illustrated in Figures 2, 3
and 4.
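
The fit described here is an ordinary least-squares line through the points (log k, log gap(k)). A minimal Python sketch follows; the base-10 logarithm, the function name, and the synthetic data (generated from a = 10, b = 0.5) are assumptions for illustration, not values from the experiments.

```python
import math


def fit_power_law(ks, gaps):
    """Least-squares fit of log10(gap) = intercept + slope * log10(k).

    Under gap(k) ~ a / k**b, intercept estimates log10(a) and slope
    estimates -b.
    """
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(gap) for gap in gaps]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic gaps from a = 10, b = 0.5, dropping the first 100 iterations.
ks = list(range(101, 2001))
gaps = [10.0 / k ** 0.5 for k in ks]
intercept, slope = fit_power_law(ks, gaps)
```

On exact power-law data the fit recovers log10(a) = 1 and slope = -0.5; on real FP output the early iterations are dropped because the power-law behavior is only asymptotic.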
Case | f, g, n     | # pure strategies (Attacker) | # pure strategies (Defender) | Slope    | Intercept | [illegible]
A1   | 50, 200, 10 | 12565671261                  | 1.76081E+15                  | -0.48374 | 1.8887    | 3.8396
A2   | 25, 100, 10 | 52451256                     | 4.26342E+12                  | -0.48378 | 1.5860    | 1.5260
A3   | 5, 20, 10   | 2002                         | 10015005                     | -0.48376 | 0.8887    | 0.3052





[Figure 2: log(gap) plotted against log(# iterations) for cases A1, A2 and A3, with the
fitted (predicted) lines.]

Figure 2. With n fixed, the slope remains constant and the intercept increases with f
and g.

In Figure 2 we see the gap between the upper and lower bounds plotted against
the number of iterations of fictitious play, on log-log scale. The slope of the fitted line
(from the column labeled "slope" in the table above the plot) indicates the rate of
convergence. Note that the slope is constant as f and g increase, and n remains fixed.
Case | f, g, n    | # pure strategies (Attacker) | # pure strategies (Defender) | Slope  | Intercept | Final Gap | Standard Error
B1   | 30, 50, 30 | 5.91E+16                     | 3.33E+21                     | -0.517 | 1.8541    | 2.0374    | 0.0094
B2   | 30, 50, 20 | 1.89E+13                     | 4.63E+16                     | -0.543 | 1.8954    | 1.7824    | 0.0157
B3   | 30, 50, 15 | 1.15E+11                     | 4.79E+13                     | -0.580 | 1.9463    | 1.5238    | 0.0187
B4   | 30, 50, 10 | 2.12E+08                     | 1.26E+10                     | -0.588 | 1.8839    | 1.1999    | 0.0270
B5   | 30, 50, 5  | 46376                        | 316251                       | -0.851 | 2.1746    | 0.4787    | 0.0469
B6   | 30, 50, 2  | 31                           | 51                           | -1.093 | 1.4431    | 0.0216    | 0.0194

[Figure 3: log(gap) plotted against log(# iterations) for cases B2 (n=20), B3 (n=15) and
B4 (n=10), with predicted lines annotated with slopes -0.543 (B2), -0.580 (B3) and
-0.588 (B4).]

Figure 3. The best-fit slope increases with n (f and g fixed).

If we increase n for a fixed f and g, we see in Figure 3 that the slope increases,
and appears to approach a limit of -0.5 (Figure 4). This is consistent with conjectures of
$1/\sqrt{k}$ convergence of FP.

[Figure 4: predicted log(gap) lines plotted against log(# iterations) for cases B1 (n=30),
B2 (n=20), B3 (n=15), B4 (n=10), B5 (n=5) and B6 (n=2). Annotation: with f and g fixed,
the asymptotic slope increases to approximately -0.5 as n becomes large.]

Figure 4. Convergence of asymptotic slope

2. Elapsed Time per FP Iteration 
We observe that for all tested values of f, g and n, the elapsed time per FP 
iteration is constant as the number of FP iterations k increases. This is illustrated in 


Figure 5 and occurs because the amount of FP data required to be manipulated and stored 


does not increase with k. 
Elapsed time (sec) by FP iteration:

f, g, n    | 1000 | 2000  | 3000  | 4000  | 5000  | 6000  | 7000  | 8000  | 9000  | 10000
5, 6, 10   | 2.4  | 4.6   | 6.8   | 9.0   | 11.2  | 13.3  | 15.5  | 17.7  | 19.9  | 22.2
20, 25, 30 | 24.3 | 48.4  | 72.6  | 96.7  | 121.4 | 145.5 | 169.1 | 193.8 | 217.6 | 242.0
40, 45, 50 | 81.7 | 163.4 | 245.2 | 326.4 | 409.2 | 490.4 | 572.2 | 653.1 | 736.6 | 817.4

[Figure 5: elapsed time plotted against total FP iterations (1000 to 10000) for the three
games f=5, g=6, n=10; f=20, g=25, n=30; and f=40, g=45, n=50.]

Figure 5. FP iterations vs. elapsed time for 3 games
3. Comparison of FP and ILP Procedures
Comparisons are made between the ILP and FP solution procedures. Three 
different payoff functions are examined. 
a. Convex Payoff Function 


The convex payoff function is given by 


$$A(x_i, y_i) = \max(x_i - y_i, 0).$$


Figure 6 shows how the two procedures performed.

Run | f, g, n     | FP (k = 2000): Elapsed time | Upper   | Lower   | Gap   | ILP: Elapsed time | Value of game
1   | 3, 4, 5     | 1.962                       | 2.208   | 2.179   | 0.029 | 0.10              | 2.2
2   | 6, 8, 5     | 2.774                       | 4.415   | 4.358   | 0.057 | 0.18              | 4.4
3   | 12, 16, 5   | 4.657                       | 8.831   | 8.717   | 0.114 | 0.22              | 8.8
4   | 15, 20, 5   | 5.668                       | 11.039  | 10.896  | 0.143 | 0.19              | 11
5   | 30, 40, 5   | 11.147                      | 22.077  | 21.792  | 0.285 | 0.25              | 22
6   | 60, 80, 5   | 24.535                      | 44.154  | 43.584  | 0.570 | 0.12              | 44
7   | 120, 160, 5 | 59.766                      | 88.308  | 87.168  | 1.140 | 0.21              | 88
8   | 150, 200, 5 | 83.801                      | 110.385 | 108.960 | 1.426 | 0.12              | 111

[Figure 6: elapsed time of ILP, elapsed time of FP, and gap of FP, plotted by run (1 to 8).]

Figure 6. Comparison of FP and ILP procedures with convex payoff function
b. Capacitated Payoff Function
The capacitated payoff function is

$$A(x_i, y_i) = \begin{cases} \max(x_i - y_i, 0), & x_i - y_i \le cap \\ cap, & x_i - y_i > cap \end{cases}$$

where cap is the maximum possible payoff. We note that the ILP objective function need
not be either convex or concave. Figure 7 shows how the two procedures performed.

Run | f, g, n     | FP (k = 2000): Elapsed time | Upper | Lower | Gap   | ILP: Elapsed time | Upper | Lower | Gap
1   | 3, 4, 5     | 1.743                       | 2.208 | 2.179 | 0.029 | 0.180             | 2.200 | 2.200 | 0.000
2   | 6, 8, 5     | 2.704                       | 2.958 | 2.872 | 0.086 | 0.100             | 3.120 | 2.800 | 0.320
3   | 12, 16, 5   | 4.697                       | 3.935 | 3.767 | 0.168 | 0.100             | 4.114 | 3.800 | 0.314
4   | 15, 20, 5   | 5.748                       | 4.252 | 4.078 | 0.174 | 0.170             | 4.333 | 4.000 | 0.333
5   | 30, 40, 5   | 11.226                      | 4.884 | 4.619 | 0.265 | 0.190             | 5.143 | 4.200 | 0.943
6   | 60, 80, 5   | 24.536                      | 5.350 | 4.960 | 0.389 | 1.420             | 5.807 | 4.200 | 1.607
7   | 120, 160, 5 | 59.946                      | 5.622 | 5.130 | 0.492 | 62.380            | 6.250 | 4.200 | 2.050
8   | 150, 200, 5 | 82.278                      | 5.713 | 5.144 | 0.570 | 102.170           | 6.338 | 4.091 | 2.247

[Figure 7: elapsed time of ILP, elapsed time of FP, gap of ILP, and gap of FP, plotted by
run (1 to 8).]

Figure 7. Comparison of FP and ILP procedures with the capacitated payoff function
c. Binary Payoff Function
The binary payoff function is

$$A(x_i, y_i) = \begin{cases} 0, & x_i - y_i \le 0 \\ 1, & x_i - y_i > 0 \end{cases}$$

As with the capacitated payoff function, the ILP in this case need not be either convex or
concave. Figure 8 shows how the two procedures performed.
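
For reference, the three payoff functions compared in this section can be written compactly in Python as below. This is an illustrative sketch rather than the thesis's MATLAB payoff routines; the function names are assumptions, and `cap` is assumed to be nonnegative.

```python
def convex_payoff(x, y):
    """Attackers in excess of the defenders, never negative."""
    return max(x - y, 0)


def capacitated_payoff(x, y, cap):
    """Convex payoff limited to at most cap (cap assumed nonnegative)."""
    return min(max(x - y, 0), cap)


def binary_payoff(x, y):
    """1 if the attackers outnumber the defenders in the area, else 0."""
    return 1 if x - y > 0 else 0


# Example: 10 attackers against 2 defenders, with a hypothetical cap of 4.
examples = (convex_payoff(10, 2),
            capacitated_payoff(10, 2, 4),
            binary_payoff(10, 2))
```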





Run | f, g, n     | FP (k = 2000): Elapsed time | Upper | Lower | Gap   | ILP: Elapsed time | Upper | Lower | Gap
1   | 3, 4, 5     | 1.021                       | 1.216 | 1.179 | 0.037 | 0.12              | 1.200 | 1.200 | 0.000
2   | 6, 8, 5     | 2.805                       | 1.477 | 1.407 | 0.070 | 0.17              | 1.600 | 1.400 | 0.200
3   | 12, 16, 5   | 4.757                       | 1.675 | 1.574 | 0.101 | 0.19              | 1.920 | 1.400 | 0.520
4   | 15, 20, 5   | 5.668                       | 1.721 | 1.618 | 0.103 | 0.11              | 2.000 | 1.333 | 0.667
5   | 30, 40, 5   | 11.186                      | 1.850 | 1.694 | 0.156 | 0.30              | 2.000 | 1.400 | 0.600
6   | 60, 80, 5   | 24.756                      | 1.921 | 1.716 | 0.205 | 2.57              | 2.167 | 1.400 | 0.767
7   | 120, 160, 5 | 60.758                      | 1.990 | 1.703 | 0.287 | 61.96             | 2.182 | 1.400 | 0.782
8   | 150, 200, 5 | 83.260                      | 2.012 | 1.686 | 0.326 | 149.89            | 2.214 | 1.400 | 0.814

[Figure 8: elapsed time of ILP, elapsed time of FP, gap of ILP, and gap of FP, plotted by
run (1 to 8).]

Figure 8. Comparison of FP and ILP procedures with the binary payoff function
C. CONCLUSIONS
As has been observed in earlier FP studies, the FP gap (the difference between the
upper and lower bounds on the game value) as a function of the number of FP iterations, k,
is approximately

$$\text{gap}(k) \approx \frac{a}{k^b},$$

for large enough k.

The best-fit a increases with f and g, and the best-fit b decreases to approximately
1/2 with increasing n.

Because of efficiencies realized in the DP procedure used to solve the FP
subproblems, the computation time required for each FP iteration is approximately
constant as the number of FP iterations increases, for fixed f, g and n.

For the convex payoff function tested, the ILP formulation solved with GAMS
was faster and more accurate than the FP procedure.

For the non-convex payoff functions tested, the FP procedure was more
competitive and sometimes significantly outperformed the ILP procedure.
IV. CONCLUSIONS AND FURTHER STUDY


We propose a new efficient fictitious play (FP) procedure to solve two-person
zero-sum Blotto games. The algorithm uses dynamic programming (DP) to solve the FP
subproblems at each iteration. By representing intermediate mixed strategies through
marginal distributions, we keep the state space of the DP manageable and independent of
the number of iterations. Although our experiments considered one type of attacker and
one type of defender, we indicate how to generalize this procedure to cases with more
than one type of attacker or defender (or both).


During this study, we identified other topics for further investigations. The first is 
to investigate generalizations of Blotto games in which defenders can be placed in such a 
way as to defend multiple areas at once. This is closer to the real situation with missile 
defense. The second is to explore the issue of playability in the ILP formulations. Our 
proposed playability constraint is currently too restrictive; we have provided examples in 
which the optimal solution to the ILP, with the playability constraint, is not equal to the 
value of the game. Further research should explore less restrictive, alternate formulations 
of playability constraints. It is possible (although unlikely) that less restrictive playability 
constraints would also yield more efficiently solvable ILP, making that approach 


competitive with the DP-based procedure for larger problems. 
APPENDIX A. MATLAB CODE FOR NEW FP PROCEDURE


1. Blotto_fp.m

% Blotto Game Fictitious Play Solution with general payoff matrix
% n = areas to be attacked and defended
% f = attackers to be allocated to the n areas
% g = defenders to be allocated to the n areas
%
% Initialize with a vector of marginals for the attacker and
% defender.
% initialize A for attacker (each row is frequency of
% 0,1,2, ... ,f being assigned to area i=1,...,n),
% initialize D for defender (each row is frequency of
% 0,1,2, ... ,g being assigned to area i=1,...,n),
% initialize upper and lower bounds on the value of the game, and
% number of iterations desired.
% Calculate P(a,d) = return if a attackers and d defenders
% attack any area
clear;
f = 150;        % number of attackers
g = 200;        % number of defenders
n = 5;          % number of areas to attack and defend
num_its = 2000; % number of FP iterations
for i = 1:f+1
    for j = 1:g+1
        % Select exactly one payoff function; leave the others commented out.
        P(i,j) = linearpayoff(i,j);
        % P(i,j) = cappedpayoff(i,j);
        % P(i,j) = binarypayoff(i,j);
    end
end

% Set initial A to all attacks in area 1
A(1,:) = [zeros(1,f),1];
for k = 2:n
    A(k,:) = [1,zeros(1,f)];
end

% Set initial D to all defenses in area 1
D(1,:) = [zeros(1,g),1];
for k = 2:n
    D(k,:) = [1,zeros(1,g)];
end
v_up = f;  % assumes all missiles are leakers
v_low = 0; % assumes no missiles are leakers
% preallocates v_up and v_low in memory
v_up = v_up*ones(1,num_its+1);
v_low = v_low*ones(1,num_its+1);

for k = 1:num_its
    % Find best pure attack for defenses seen so far (and upper bound).
    [opt_atk,v_u] = attacker(D,P,n,f,g);
    % Find best pure defense for attacks seen so far (and lower bound).
    [opt_def,v_l] = defender(A,P,n,f,g);
    % Update attack and defense marginals.
    A = update_attack_marginals(A,opt_atk);
    D = update_defense_marginals(D,opt_def);
    v_up(k+1) = min(v_u,v_up(k));
    v_low(k+1) = max(v_l,v_low(k));
    if (k/100 == floor(k/100)), home, k, gap = (v_up(k+1) - v_low(k+1)), end
end
figure(1)
loglog([0:num_its],(v_up - v_low),'ro'),grid on, ...
    title('(upper bound - lower bound) vs. #FP iterations')
figure(2)
plot([0:num_its],v_up,'go',[0:num_its],v_low,'rx'),grid on, ...
    axis([0 num_its 0 f]),title('Upper and lower bounds vs. #FP iterations')
bounds = [v_up(end),v_low(end)]
meanA = mean(A)/num_its
meanD = mean(D)/num_its
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2. Attacker.m 


function [opt_attack,v_up] = attacker(D,P,n,f,g)
% Takes defenses D and returns optimal attack column vector and
% an upper bound on the value of the Blotto game.
% n areas, f attackers, g defenders
% Uses general payoff function P(a,d)

% #defenses (increases with FP iterations)
num_defenses = sum(D(1,:));
s = D*P'; % compute s(i,j) = exp. 1-step payoff
          % when j attackers assigned to area i.
          % Uses the marginal matrix
          % D(i,j) = # times j defenders assigned to area i.

v_star = zeros(n,f+1); % initialize v_star(i,j) = optimal
% return when j attackers are available for i areas
a_star = zeros(n,f+1); % initialize a_star(i,j) = optimal
% # attackers to use when j attackers are available for i areas


% n = 1
v_star(1,:) = s(1,:); % optimal payoff when 1 area is included
a_star(1,:) = [0:f]; % optimal #attackers when 1 area is included
% n > 1
for i = 2:n % areas
    for j = 0:f % j attackers remain to be used
        v_starnew = zeros(1,j+1); % initialize v_starnew
        for k = 0:j
            v_starnew(k+1) = s(i,k+1) + v_star(i-1,j-k+1);
            % enumerating possible returns
        end
        [v_star(i,j+1),a_star(i,j+1)] = max(v_starnew);
        % identifying the # attackers giving a max. payoff
        a_star(i,j+1) = a_star(i,j+1)-1;
        % correcting for col. 1 being for 0 attackers
    end
end


v_up = v_star(n,f+1)/num_defenses;

opt_attack = zeros(n,1); % initialize opt_attack
opt_attack(n) = a_star(n,f+1);
% establish optimal #attackers to use for n areas
attackers_remaining = f - opt_attack(n);
% update #attackers remaining for remaining n-1 areas
for i = n-1:-1:1
    opt_attack(i) = a_star(i,attackers_remaining+1);
    attackers_remaining = attackers_remaining - opt_attack(i);
end
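
For readers who want to check the recursion outside MATLAB, the following is a minimal Python sketch of the same dynamic program used in Attacker.m. It is not part of the thesis code: the function name `best_attack` and the 0-based list layout are assumptions made for illustration, and `s[i][a]` plays the role of the (unnormalized) one-step payoff matrix `s` computed from the defense marginals.

```python
def best_attack(s, f):
    """Allocate exactly f attackers over len(s) areas to maximize total payoff.

    s[i][a] is the (unnormalized) expected payoff of sending a attackers
    to area i, for a = 0..f. Returns (best_total_payoff, allocation).
    Hypothetical helper mirroring the v_star / a_star recursion.
    """
    n = len(s)
    # v[i][j]: best payoff over areas 0..i when j attackers are available.
    # choice[i][j]: attackers sent to area i in that best plan.
    v = [list(s[0][: f + 1])]          # one area: use all j attackers there
    choice = [list(range(f + 1))]
    for i in range(1, n):
        row_v, row_c = [], []
        for j in range(f + 1):
            # Enumerate k attackers for area i; ties go to the smallest k,
            # matching the first-maximum behavior of MATLAB's max.
            best_k = max(range(j + 1), key=lambda k: s[i][k] + v[i - 1][j - k])
            row_v.append(s[i][best_k] + v[i - 1][j - best_k])
            row_c.append(best_k)
        v.append(row_v)
        choice.append(row_c)
    # Backtrack from the last area to recover the allocation.
    alloc = [0] * n
    remaining = f
    for i in range(n - 1, -1, -1):
        alloc[i] = choice[i][remaining]
        remaining -= alloc[i]
    return v[n - 1][f], alloc


if __name__ == "__main__":
    value, alloc = best_attack([[0, 1, 2], [0, 3, 3]], 2)
    print(value, alloc)
```

Replacing the maximization with a minimization over defender allocations gives the corresponding sketch for the defender's best reply.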
3. Defender.m


function [opt_defense,v_low] = defender(A,P,n,f,g)
% Takes attacks A and returns optimal defense column vector and
% a lower bound on the value of the Blotto game.
% n areas, f attackers, g defenders
% Uses general payoff function P(a,d)

num_attacks = sum(A(1,:));
q = A*P; % compute q(i,j) = exp. 1-step payoff
         % when j defenders assigned to area i.
         % Uses the marginal matrix
         % A(i,j) = # times j attackers assigned to area i.
r_star = zeros(n,g+1); % initialize r_star and d_star
d_star = zeros(n,g+1);
% n = 1
r_star(1,:) = q(1,:);
d_star(1,:) = [0:g];
% n > 1
for i = 2:n
    for j = 0:g
        r_starnew = zeros(1,j+1); % initialize r_starnew
        for k = 0:j
            r_starnew(k+1) = q(i,k+1) + r_star(i-1,j-k+1);
            % enumerating possible defenders
        end
        [r_star(i,j+1),d_star(i,j+1)] = min(r_starnew);
        % identifying the # defenders giving a min. payoff
        d_star(i,j+1) = d_star(i,j+1)-1;
        % correcting for col. 1 being for 0 defenders
    end
end


v_low = r_star(n,g+1)/num_attacks;

opt_defense = zeros(n,1);
opt_defense(n) = d_star(n,g+1);
defenders_remaining = g - opt_defense(n);
for i = n-1:-1:1
    opt_defense(i) = d_star(i,defenders_remaining+1);
    defenders_remaining = defenders_remaining - opt_defense(i);
end
4. Update_attack_marginals.m


function [A_new] = update_attack_marginals(A,opt_atk)
[n,m] = size(A);
A_new = A;
for k = 1:n
    z = opt_atk(k);
    A_new(k,z+1) = A_new(k,z+1)+1;
end

5. Update_defense_marginals.m

function [D_new] = update_defense_marginals(D,opt_def)
[n,m] = size(D);
D_new = D;
for k = 1:n
    z = opt_def(k);
    D_new(k,z+1) = D_new(k,z+1)+1;
end
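
The two update routines are the same operation applied to different count matrices. As a hedged illustration only, the Python sketch below shows that bookkeeping and how the counts turn into the empirical marginal distributions that fictitious play responds to. The names `update_marginals` and `empirical_distribution` are assumptions, not thesis routines.

```python
def update_marginals(counts, allocation):
    """Return a copy of counts with one more observation of allocation.

    counts[k][z] = number of times z units were assigned to area k
    (0-based, so column 0 means "zero units"). allocation[k] is the
    number of units the latest pure strategy assigns to area k.
    """
    new_counts = [row[:] for row in counts]   # leave the input untouched
    for k, z in enumerate(allocation):
        new_counts[k][z] += 1
    return new_counts


def empirical_distribution(counts):
    """Normalize each area's counts into a probability distribution."""
    return [[c / sum(row) for c in row] for row in counts]


if __name__ == "__main__":
    # Two areas, two units; initial play puts everything in area 0.
    counts = [[0, 0, 1], [1, 0, 0]]
    counts = update_marginals(counts, [1, 1])
    print(counts)
    print(empirical_distribution(counts))
```

Every row of the count matrix sums to the number of plays observed so far, which is why the MATLAB routines recover that number from the first row alone.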


6. Payoff.m 


Unconstrained target case: 


Linearpayoff.m 
function P=linearpayoff (a,d) 


if a>d 
P=a-d; 
else 
P=0; 
end 


Capacitated target case: 





Cappedpayoff.m 
function P = cappedpayoff(a,d, cap) 
P=max (0,min(a-d, cap) ); 


Binary target case: 





Binarypayoff.m 
function P = binarypayoff (a,d) 
P=(a>d); 
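
To compare the three payoff cases side by side, the following Python sketch builds the full payoff matrix for small force sizes. It is an illustration under stated assumptions: the function `payoff_matrix` and its `kind` argument are hypothetical, and rows and columns are indexed directly by the number of attackers and defenders (0-based), unlike the 1-based MATLAB loop.

```python
def payoff_matrix(f, g, kind="linear", cap=None):
    """Build P[a][d] for a = 0..f attackers and d = 0..g defenders.

    kind is one of "linear", "capped", "binary"; cap is required for
    the capped case. Hypothetical helper, not part of the thesis code.
    """
    if kind == "capped" and cap is None:
        raise ValueError("the capped payoff needs a capacity value")

    def payoff(a, d):
        if kind == "linear":      # unconstrained target: each leaker scores
            return max(a - d, 0)
        if kind == "capped":      # capacitated target: at most cap damage
            return max(0, min(a - d, cap))
        if kind == "binary":      # binary target: destroyed or not
            return 1 if a > d else 0
        raise ValueError("unknown payoff kind: " + kind)

    return [[payoff(a, d) for d in range(g + 1)] for a in range(f + 1)]


if __name__ == "__main__":
    for kind in ("linear", "capped", "binary"):
        print(kind, payoff_matrix(3, 2, kind, cap=2))
```

All three payoffs depend only on the difference a - d, which is why shifting both indices by one in the MATLAB loop does not change the matrix.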
APPENDIX B. GAMS CODE FOR ILP PROCEDURE













































































*  Blotto Attack & Defense game 
* Attacker is the row player and maximizer. 
* Defender is the column player and minimizer. 
* Uses integer restrictions to require playability 
$offlisting
$inlinecom { }
OPTIONS
SOLPRINT = OFF,
DECIMALS = 2,
LIMCOL = 0,
LIMROW = 0,
RESLIM = 86400, {MAX SECONDS}
ITERLIM = 100000, {MAX PIVOTS}
OPTCA = 0.0, {ABSOLUTE INTEGRALITY TOLERANCE}
OPTCR = 0.00, {RELATIVE INTEGRALITY TOLERANCE}
lp = xa,
MIP = xa; {XA Solver}
SCALARS
N total number of areas /5 /
F total number of attackers /15 /
G total number of defenders /20 /
Time execution time /0 /
;
SETS
i # attackers used /0attack*15attack/
j # defenders used /0defend*20defend/
;











PARAMETER P(i,j) damage done when i attack and j defend; 





LOOP((i,j), P(i,j)=max((ord(i)-ord(j)),0)); {convex payoff}
*LOOP((i,j), P(i,j)=min(P(i,j),3)); {capacitated payoff}
*LOOP((i,j), if(P(i,j)>0,
*        P(i,j)=1;
*     else
*        P(i,j)=0;)
*    ); {binary payoff}
VARIABLES
v1 value of game for defenders
v2 value of game for attackers
x(i) Attacker marginal distribution on # attackers used
y(j) Defender marginal distribution on # defenders used
xcount(i) Attacker marginal dist. * N
{= # times i attackers used}
ycount(j) Defender marginal dist. * N
{= # times j defenders used}
a y-intercept
b slope
c y-intercept
d slope;

POSITIVE VARIABLES x, y, b, d ;
INTEGER VARIABLES xcount, ycount ;
*POSITIVE VARIABLES xcount, ycount ;


























EQUATIONS
objective1 objective function
expreturn_y(i) expected return
meandefenders constraint on mean number of defenders
probsum_y sum of probabilities is 1
* extraconstraint0 extra constraints to help explore
*                  different optimal solutions
ycountdef(j) definition of ycount
;
objective1.. v1 =e= N*c + d*F ;
expreturn_y(i).. sum(j, P(i,j)*y(j)) =l= c+d*(ord(i)-1);
meandefenders.. N*sum(j,y(j)*(ord(j)-1)) =l= G;
probsum_y.. sum(j,y(j)) =e= 1 ;
*extraconstraint0.. y('10defend') =e= .25 ;
ycountdef(j).. ycount(j) =e= N*y(j) ;

MODEL Defense /objective1,
expreturn_y,
meandefenders,
probsum_y,
ycountdef
/;
EQUATIONS
objective2 objective function
expreturn_x(j) expected return
meanattackers constraint on mean number of attackers
probsum_x sum of probabilities is 1
xcountdef(i) definition of xcount
;
objective2 .. v2 =e= N*a - b*G ;
expreturn_x(j) .. sum(i, x(i)*P(i,j)) =g= a-b*(ord(j)-1);
meanattackers .. N*sum(i,x(i)*(ord(i)-1)) =l= F ;
probsum_x .. sum(i,x(i)) =e= 1 ;
xcountdef(i).. xcount(i) =e= N*x(i) ;

MODEL Attack /objective2,
expreturn_x,
meanattackers,
probsum_x,
xcountdef
/;

SOLVE Defense using mip minimizing v1;
SOLVE Attack using mip maximizing v2;

Time=Time+Defense.resusd+Attack.resusd;

DISPLAY v1.l, v2.l, y.l, x.l ;
DISPLAY ycount.l,xcount.l,
Time;
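
Read as mathematics, the two GAMS models above appear to be the following integer programs. This is a transcription under stated assumptions rather than a statement from the thesis text: the indices i and j stand for ord(i)-1 and ord(j)-1 (the numbers of attackers and defenders used in an area), P_{ij} is the payoff parameter, and the integrality conditions correspond to the xcount and ycount definitions.

```latex
\begin{align*}
\textbf{Defense:}\quad
\min_{y,\,c,\,d}\quad & v_1 = N c + d F \\
\text{s.t.}\quad
 & \sum_{j=0}^{G} P_{ij}\, y_j \le c + d\, i, && i = 0,\dots,F, \\
 & N \sum_{j=0}^{G} j\, y_j \le G, \qquad \sum_{j=0}^{G} y_j = 1, \\
 & N y_j \in \mathbb{Z}_{\ge 0}, \quad y_j \ge 0, \quad d \ge 0, \quad c \text{ free}; \\[1ex]
\textbf{Attack:}\quad
\max_{x,\,a,\,b}\quad & v_2 = N a - b G \\
\text{s.t.}\quad
 & \sum_{i=0}^{F} x_i\, P_{ij} \ge a - b\, j, && j = 0,\dots,G, \\
 & N \sum_{i=0}^{F} i\, x_i \le F, \qquad \sum_{i=0}^{F} x_i = 1, \\
 & N x_i \in \mathbb{Z}_{\ge 0}, \quad x_i \ge 0, \quad b \ge 0, \quad a \text{ free}.
\end{align*}
```

Requiring N y_j and N x_i to be integers is what the header comment calls playability: each marginal can then be realized by assigning a whole number of the N areas to each force level.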